Code Tiling for Improving the Cache Performance of PDE Solvers

Authors

Qingguang Huang

Jingling Xue

Xavier Vera

Abstract

For SOR-like PDE solvers, loop tiling either helps little in improving data locality or hurts performance. This paper presents a novel compiler technique called code tiling for generating fast tiled codes for these solvers on uniprocessors with a memory hierarchy. Code tiling combines loop tiling with a new array layout transformation called data tiling in such a way that a significant number of the cache misses that would otherwise be present in tiled codes are eliminated. Compared to nine existing loop tiling algorithms, our technique delivers impressive performance speedups (faster by factors of 1.55 to 2.62) and smooth performance curves across a range of problem sizes on representative machine architectures. The synergy of loop tiling and data tiling allows us to find a tile size that minimises a cache miss objective function independently of the problem size parameters. This “one-size-fits-all” scheme makes our approach attractive for designing fast SOR solvers without having to generate a multitude of versions specialised for different problem sizes.
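As a rough illustration of the loop tiling half of the idea, the sketch below tiles the two spatial loops of a single in-place SOR sweep on a 2D five-point stencil. It is not the paper's code tiling: data tiling (the array layout change) and any tiling across time steps are not shown, and the names `make_grid`, `sor_sweep`, `sor_sweep_tiled`, `ti` and `tj` are hypothetical. Because every dependence of the sweep points in a non-negative direction in both `i` and `j`, visiting the grid tile by tile preserves the update order each point relies on, so both versions produce the same values.

```python
def make_grid(n, m):
    """Build a deterministic n x m test grid (hypothetical helper)."""
    return [[float((i * 31 + j * 17) % 11) for j in range(m)] for i in range(n)]


def sor_update(u, i, j, omega):
    """Relax one interior point in place using the five-point stencil."""
    gs = 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1])
    u[i][j] += omega * (gs - u[i][j])


def sor_sweep(u, omega):
    """One untiled SOR sweep over the interior, in lexicographic order."""
    n, m = len(u), len(u[0])
    for i in range(1, n - 1):
        for j in range(1, m - 1):
            sor_update(u, i, j, omega)


def sor_sweep_tiled(u, omega, ti, tj):
    """The same sweep, visiting the interior in ti x tj rectangular tiles.

    Tile sizes are plain parameters here; choosing them to minimise
    cache misses is the subject of the paper and is not attempted.
    """
    n, m = len(u), len(u[0])
    for ii in range(1, n - 1, ti):            # tile origin, rows
        for jj in range(1, m - 1, tj):        # tile origin, columns
            for i in range(ii, min(ii + ti, n - 1)):
                for j in range(jj, min(jj + tj, m - 1)):
                    sor_update(u, i, j, omega)


if __name__ == "__main__":
    a = make_grid(9, 7)
    b = [row[:] for row in a]
    sor_sweep(a, 1.5)
    sor_sweep_tiled(b, 1.5, 4, 3)
    print(a == b)
```

In a compiled language the tiled version changes which array elements are resident in cache while a tile is processed; in Python the sketch only demonstrates the iteration order and its legality, not any speedup.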

Similar resources

Software Support For Improving Locality in Scientific Codes

We propose to develop and evaluate software support for improving locality for advanced scientific applications. We will investigate compiler and run-time techniques needed to achieve high performance on both sequential and parallel machines. We will focus on two areas. First, iterative PDE solvers for 3D partial differential equations have poor locality because accesses to nearby elements in h...

Analyzing Advanced PDE Solvers Through Simulation

By simulating a real computer it is possible to gain a detailed knowledge of the cache memory utilization of an application, e.g., a partial differential equation (PDE) solver. Using this knowledge, we can discover regions with intricate cache memory performance. Furthermore, this information makes it possible to identify performance bottlenecks. In this paper, we employ full system simulation ...

Fast, Adaptively Refined Computational Elements in 3D

We describe a multilevel adaptive grid refinement package designed to provide a high performance, serial or parallel patch class for use in PDE solvers. We provide a high level description algorithmically with mathematical motivation. The C++ code uses cache aware data structures and automatically load balances.

Interference Lattice-based Loop Nest Tilings for Stencil Computations

A common method for improving performance of stencil operations on structured multi-dimensional discretization grids is loop tiling. Tile shapes and sizes are usually determined heuristically, based on the size of the primary data cache. We provide a lower bound on the numbers of cache misses that must be incurred by any tiling, and a close achievable bound using a particular tiling based on th...

Performance Modelling for Parallel PDE Solvers on NUMA-Systems

A detailed model of the memory performance of a PDE solver running on a NUMA-system is set up. Due to the complexity of modern computers, such a detailed model inevitably is very complicated. Therefore, approximations are introduced that simplify the model and allow NUMA-systems and PDE solvers to be described conveniently. Using the simplified model, it is shown that PDE solvers using ordered ...

Publication date: 2003